FIGURE 1 



AGTCCGAGACGGGCTTTTCCCAGAGAGCTAAAAGAGAAGGGCCAGAGAATGTCGTCCCAG 

CCAGCAGGGAACCAGACCTCCCCCGGGGCCACAGAGGACTACTCCTATGGCAGCTGGTAC 

ATCGATGAGCCCCAGGGGGGCGAGGAGCTCCAGCCAGAGGGGGAAGTGCCCTCCTGCCAG 

ACCAGCATACCACCCGGCCTGTACCACGCCTGCCTGGCCTCGCTGTCAATCCTTGTGCTG 

CTGCTCCTGGGCATGCTGGTGAGGCGCCGCCAGCTCTGGCCTGACTGTGTGCGTGGCAGG 

CCCGGCCTGCCCAGCCCTGTGGATTTCTTGGCTGGGGACAGGCCCCGGGCAGTGCCTGCT 

GCTGTTTTCATGGTCCTCCTGAGGTCCCTGTGTTTGCTGCTCCCCGACGAGGACGCATTG 

CCCTTCCTGACTCTCGCCTCAGCACCCAGCCAAGATGGGAAAACTGAGGCTCCAAGAGGG 

GCCTGGAAGATACTGGGACTGTTCTATTATGCTGCCCTCTACTACCCTCTGGCTGCCTGT 

GCCACGGCTGGCCACACAGCTGCACACCTGCTCGGCAGCACGCTGTCCTGGGCCCACCTT 

GGGGTCCAGGTCTGGCAGAGGGCAGAGTGTCCCCAGGTGCCCAAGATCTACAAGTACTAC 

TCCCTGCTGGCCTCCCTGCCTCTCCTGCTGGGCCTCGGATTCCTGAGCCTTTGGTACCCT 

GTGCAGCTGGTGAGAAGCTTCAGCCGTAGGACAGGAGCAGGCTCCAAGGGGCTGCAGAGC 

AGCTACTCTGAGGAATATCTGAGGAACCTCCTTTGCAGGAAGAAGCTGGGAAGCAGCTAC 

CACACCTCCAAGCATGGCTTCCTGTCCTGGGCCCGCGTCTGCTTGAGACACTGCATCTAC

ACTCCACAGCCAGGATTCCATCTCCCGCTGAAGCTGGTGCTTTCAGCTACACTGACAGGG 

ACGGCCATTTACCAGGTGGCCCTGCTGCTGCTGGTGGGCGTGGTACCCACTATCCAGAAG 

GTGAGGGCAGGGGTCACGACGGATGTCTCCTACCTGCTGGCCGGCTTTGGAATCGTGCTC 

TCCGAGGACAAGCAGGAGGTGGTGGAGCTGGTGAAGCACCATCTGTGGGCTCTGGAAGTG 

TGCTACATCTCAGCCTTGGTCTTGTCCTGCTTACTCACCTTCCTGGTCCTGATGCGCTCA 

CTGGTGACACACAGGACCAACCTTCGAGCTCTGCACCGAGGAGCTGCCCTGGACTTGAGT 

CCCTTGCATCGGAGTCCCCATCCCTCCCGCCAAGCCATATTCTGTTGGATGAGCTTCAGT 

GCCTACCAGACAGCCTTTATCTGCCTTGGGCTCCTGGTGCAGCAGATCATCTTCTTCCTG 

GGAACCACGGCCCTGGCCTTCCTGGTGCTCATGCCTGTGCTCCATGGCAGGAACCTCCTG 

CTCTTCCGTTCCCTGGAGTCCTCGTGGCCCTTCTGGCTGACTTTGGCCCTGGCTGTGATC 

CTGCAGAACATGGCAGCCCATTGGGTCTTCCTGGAGACTCATGATGGACACCCACAGCTG 

ACCAACCGGCGAGTGCTCTATGCAGCCACCTTTCTTCTCTTCCCCCTCAATGTGCTGGTG 

GGTGCCATGGTGGCCACCTGGCGAGTGCTCCTCTCTGCCCTCTACAACGCCATCCACCTT 

GGCCAGATGGACCTCAGCCTGCTGCCACCGAGAGCCGCCACTCTCGACCCCGGCTACTAC 

ACGTACCGAAACTTCTTGAAGATTGAAGTCAGCCAGTCGCATCCAGCCATGACAGCCTTC 

TGCTCCCTGCTCCTGCAAGCGCAGAGCCTCCTACCCAGGACCATGGCAGCCCCCCAGGAC 

AGCCTCAGACCAGGGGAGGAAGACGAAGGGATGCAGCTGCTACAGACAAAGGACTCCATG 

GCCAAGGGAGCTAGGCCCGGGGCCAGCCGCGGCAGGGCTCGCTGGGGTCTGGCCTACACG 

CTGCTGCACAACCCAACCCTGCAGGTCTTCCGCAAGACGGCCCTGTTGGGTGCCAATGGT 

GCCCAGCCCTGAGGGCAGGGAAGGTCAACCCACCTGCCCATCTGTGCTGAGGCATGTTCC 

TGCCTACCATCCTCCTCCCTCCCCGGCTCTCCTCCCAGCATCACACCAGCCATGCAGCCA

GCAGGTCCTCCGGATCACTGTGGTTGGGTGGAGG-^XTTGTCTGCACTGGGAGCCTCAGGAG 

GGCTCTGCTCCACCCACTTGGCTATGGGAGAGCCAGCAGGGGTTCTGGAGAAAAAAACTG

GTGGGTTAGGGCCTTGGTCCAGGAGCCAGTTGAGCCAGGGCAGCCACATCCAGGCGTCTC 

CCTACCCTGGCTCTGCCATCAGCCTTGAAGGGCCTCGATGAAGCCTTCTCTGGAACCACT 

CCAGCCCAGCTCCACCTCAGCCTTGGCCTTCACGCTGTGGAAGCAGCCAAGGCACTTCCT 

CACCGCCTCAGCGCCACGGACCTCTCTGGGGAGTGGCCGGAAAGCTCCCGGTCCTCTGGC 

CTGCAGGGCAGCCCAAGTCATGACTCAGACCAGGTCCCACACTGAGCTGCCCACACTCGA 

GAGCCAGATATTTTTGTAGTTTTTATGCCTTTGGCTATTATGAAAGAGGTTAGTGTGTTC 

CCTGCAA TAAA CTTGTTCGTG AG AAAAAAAAJVAAAAAAAAAAAAAAAAAAAAAAAAAAAA 

AAAAAAAJWUWU^AAAAAAAJWLAAAAAAAA 



FIGURE 2



MSSQPAGNQTSPGATEDYSYGSWYIDEPQGGEELQPEGEVPSCHTSIPPGLYHACLASLS
ILVLLLLAMLVRRRQLWPDCVRGRPGLPSPVDFLAGDR

EDALPFLTLASAPSQDGKTEAPRGAWKILGLFYYAALYYPLAACATAGHTAAHLLGSTLS
WAHLGVQVWQRAECPQVPKIYKYYSLLASLPLLLGLGFLSLWYPVQLVRSFSRRTGAGSK
GLQSSYSEEYLRNLLCRKKLGSSYHTSKHGFLSWARVCLRHCIYTPQPGFHLPLKLVLSA
TLTGTAIYQVALLLLVGVVPTIQKVRAGVTTDVSYLLAGFGIVLSEDKQEVVELVKHHLW
ALEVCYISALVLSCLLTFLVLMRSLVTHRTNLRALHRGAALDLSPLHRSPHPSRQAIFCW
MSFSAYQTAFICLGLLVQQIIFFLGTTALAFLVLMPVLHGRNLLLFRSLESSWPFWLTLA
LAVILQNMAAHWVFLETHDGHPQLTNRRVLYAATFLLFPLNVLVGAMVATWRVLLSALYN
AIHLGQMDLSLLPPRAATLDPGYYTYRNFLKIEVSQSHPAMTAFCSLLLQAQSLLPRTMA
APQDSLRPGEEDEGMQLLQTKDSMAKGARPGASRGRARWGLAYTLLHNPTLQVFRKTALL
GANGAQP

Important features of the protein:

Signal peptide:

None

Transmembrane domain:

54-69
102-119
148-166
207-222
303-320
364-380
433-453
474-489
560-535

Motif file:

Motif name: N-glycosylation site.

8-12

Motif name: N-myristoylation site.

50-56
176-182
241-247
317-323
341-347
525-531
627-633
631-637
640-646
661-667

Motif name: Prokaryotic membrane lipoprotein lipid attachment site.

364-375

Motif name: ATP/GTP-binding site motif A (P-loop).

132-140
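A motif entry such as the N-glycosylation site can be located by a simple pattern scan. The sketch below assumes the standard PROSITE-style consensus N-{P}-[ST]-{P} (asparagine, any residue except proline, serine or threonine, any residue except proline); the figure does not state which pattern the motif file used, so both the pattern and the function name `next_nglyc_site` are illustrative assumptions.

```c
#include <string.h>

/* Scan a protein sequence (one-letter codes, upper case) for the
 * assumed consensus N-{P}-[ST]-{P}.
 * Returns the 1-based position of the first match at or after the
 * 1-based position `from`, or 0 if there is no further match. */
int next_nglyc_site(const char *seq, int from)
{
    int len = (int)strlen(seq);
    int i;

    for (i = (from < 1 ? 0 : from - 1); i + 3 < len; i++) {
        if (seq[i] == 'N' &&
            seq[i + 1] != 'P' &&
            (seq[i + 2] == 'S' || seq[i + 2] == 'T') &&
            seq[i + 3] != 'P')
            return i + 1;
    }
    return 0;
}
```

Calling the function repeatedly with `from` set to one past the previous hit enumerates all candidate sites in a sequence.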



FIGURE 3A 
PRO XXXXXXXXXXXXXXX (Length = 15 amino acids)

Comparison Protein XXXXXYYYYYYY (Length = 12 amino acids)

% amino acid sequence identity =

(the number of identically matching amino acid residues between the two polypeptide
sequences as determined by ALIGN-2) divided by (the total number of amino acid residues
of the PRO polypeptide) =

5 divided by 15 = 33.3%



FIGURE 3B 



PRO XXXXXXXXXX (Length = 10 amino acids) 

Comparison Protein XXXXXYYYYYYZZYZ (Length = 15 amino acids) 

% amino acid sequence identity = 

(the number of identically matching amino acid residues between the two polypeptide 
sequences as determined by ALIGN-2) divided by (the total number of amino acid residues
of the PRO polypeptide) = 



5 divided by 10 = 50% 
PRO-DNA NNNNNNNNNNNNNN (Length = 14 nucleotides)

Comparison DNA NNNNNNLLLLLLLLLL (Length = 16 nucleotides)

% nucleic acid sequence identity =

(the number of identically matching nucleotides between the two nucleic acid sequences as
determined by ALIGN-2) divided by (the total number of nucleotides of the PRO-DNA
nucleic acid sequence) =

6 divided by 14 = 42.9%
PRO-DNA NNNNNNNNNNNN (Length = 12 nucleotides)

Comparison DNA NNNNLLLVV (Length = 9 nucleotides)

% nucleic acid sequence identity =

(the number of identically matching nucleotides between the two nucleic acid sequences as
determined by ALIGN-2) divided by (the total number of nucleotides of the PRO-DNA
nucleic acid sequence) =

4 divided by 12 = 33.3%
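The worked examples all apply one rule: the number of identical matches divided by the length of the PRO (reference) sequence, never by the length of the comparison sequence. A minimal sketch of that arithmetic follows; the function names are hypothetical, and the match count is taken as an input rather than computed by ALIGN-2.

```c
#include <stdio.h>

/* Percent sequence identity as defined in the worked examples:
 * identical matches divided by the length of the PRO sequence. */
double percent_identity(int matches, int pro_length)
{
    if (pro_length <= 0)
        return 0.0;     /* assumption: report 0 for an empty reference */
    return 100.0 * (double)matches / (double)pro_length;
}

/* Print one worked example rounded to one decimal place. */
void print_identity(const char *label, int matches, int pro_length)
{
    printf("%s: %d divided by %d = %.1f%%\n",
           label, matches, pro_length,
           percent_identity(matches, pro_length));
}
```

For instance, `print_identity("protein", 5, 10)` reproduces the 50% case, in which the comparison protein is longer than the PRO polypeptide but the divisor is still the PRO length.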



FIGURE 4A

/*
 * C-C increased from 12 to 15
 * Z is average of EQ
 * B is average of ND
 * match with stop is _M; stop-stop = 0; J (joker) match = 0
 */
#define _M      -8      /* value of a match with a stop */

int     _day[26][26] = {
/*         A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P  Q  R  S  T  U  V  W  X  Y  Z */
/* A */ {  2, 0,-2, 0, 0,-4, 1,-1,-1, 0,-1,-2,-1, 0,_M, 1, 0,-2, 1, 1, 0, 0,-6, 0,-3, 0},
/* B */ {  0, 3,-4, 3, 2,-5, 0, 1,-2, 0, 0,-3,-2, 2,_M,-1, 1, 0, 0, 0, 0,-2,-5, 0,-3, 1},
/* C */ { -2,-4,15,-5,-5,-4,-3,-3,-2, 0,-5,-6,-5,-4,_M,-3,-5,-4, 0,-2, 0,-2,-8, 0, 0,-5},
/* D */ {  0, 3,-5, 4, 3,-6, 1, 1,-2, 0, 0,-4,-3, 2,_M,-1, 2,-1, 0, 0, 0,-2,-7, 0,-4, 2},
/* E */ {  0, 2,-5, 3, 4,-5, 0, 1,-2, 0, 0,-3,-2, 1,_M,-1, 2,-1, 0, 0, 0,-2,-7, 0,-4, 3},
/* F */ { -4,-5,-4,-6,-5, 9,-5,-2, 1, 0,-5, 2, 0,-4,_M,-5,-5,-4,-3,-3, 0,-1, 0, 0, 7,-5},
/* G */ {  1, 0,-3, 1, 0,-5, 5,-2,-3, 0,-2,-4,-3, 0,_M,-1,-1,-3, 1, 0, 0,-1,-7, 0,-5, 0},
/* H */ { -1, 1,-3, 1, 1,-2,-2, 6,-2, 0, 0,-2,-2, 2,_M, 0, 3, 2,-1,-1, 0,-2,-3, 0, 0, 2},
/* I */ { -1,-2,-2,-2,-2, 1,-3,-2, 5, 0,-2, 2, 2,-2,_M,-2,-2,-2,-1, 0, 0, 4,-5, 0,-1,-2},
/* J */ {  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,_M, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
/* K */ { -1, 0,-5, 0, 0,-5,-2, 0,-2, 0, 5,-3, 0, 1,_M,-1, 1, 3, 0, 0, 0,-2,-3, 0,-4, 0},
/* L */ { -2,-3,-6,-4,-3, 2,-4,-2, 2, 0,-3, 6, 4,-3,_M,-3,-2,-3,-3,-1, 0, 2,-2, 0,-1,-2},
/* M */ { -1,-2,-5,-3,-2, 0,-3,-2, 2, 0, 0, 4, 6,-2,_M,-2,-1, 0,-2,-1, 0, 2,-4, 0,-2,-1},
/* N */ {  0, 2,-4, 2, 1,-4, 0, 2,-2, 0, 1,-3,-2, 2,_M,-1, 1, 0, 1, 0, 0,-2,-4, 0,-2, 1},
/* O */ { _M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M, 0,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M,_M},
/* P */ {  1,-1,-3,-1,-1,-5,-1, 0,-2, 0,-1,-3,-2,-1,_M, 6, 0, 0, 1, 0, 0,-1,-6, 0,-5, 0},
/* Q */ {  0, 1,-5, 2, 2,-5,-1, 3,-2, 0, 1,-2,-1, 1,_M, 0, 4, 1,-1,-1, 0,-2,-5, 0,-4, 3},
/* R */ { -2, 0,-4,-1,-1,-4,-3, 2,-2, 0, 3,-3, 0, 0,_M, 0, 1, 6, 0,-1, 0,-2, 2, 0,-4, 0},
/* S */ {  1, 0, 0, 0, 0,-3, 1,-1,-1, 0, 0,-3,-2, 1,_M, 1,-1, 0, 2, 1, 0,-1,-2, 0,-3, 0},
/* T */ {  1, 0,-2, 0, 0,-3, 0,-1, 0, 0, 0,-1,-1, 0,_M, 0,-1,-1, 1, 3, 0, 0,-5, 0,-3, 0},
/* U */ {  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,_M, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
/* V */ {  0,-2,-2,-2,-2,-1,-1,-2, 4, 0,-2, 2, 2,-2,_M,-1,-2,-2,-1, 0, 0, 4,-6, 0,-2,-2},
/* W */ { -6,-5,-8,-7,-7, 0,-7,-3,-5, 0,-3,-2,-4,-4,_M,-6,-5, 2,-2,-5, 0,-6,17, 0, 0,-6},
/* X */ {  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,_M, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
/* Y */ { -3,-3, 0,-4,-4, 7,-5, 0,-1, 0,-4,-1,-2,-2,_M,-5,-4,-4,-3,-3, 0,-2, 0, 0,10,-4},
/* Z */ {  0, 1,-5, 2, 3,-5, 0, 2,-2, 0, 0,-2,-1, 1,_M, 0, 3, 0, 0, 0, 0,-2,-6, 0,-4, 4}
};

Page 1 of day.h



FIGURE 4B 





#include <stdio.h>
#include <ctype.h>

#define MAXJMP  16      /* max jumps in a diag */
#define MAXGAP  24      /* don't continue to penalize gaps larger than this */
#define JMPS    1024    /* max jmps in an path */
#define MX      4       /* save if there's at least MX-1 bases since last jmp */

#define DMAT    3       /* value of matching bases */
#define DMIS    0       /* penalty for mismatched bases */
#define DINS0   8       /* penalty for a gap */
#define DINS1   1       /* penalty per base */
#define PINS0   8       /* penalty for a gap */
#define PINS1   4       /* penalty per residue */

struct jmp {
    short           n[MAXJMP];  /* size of jmp (neg for dely) */
    unsigned short  x[MAXJMP];  /* base no. of jmp in seq x */
};                              /* limits seq to 2^16 -1 */

struct diag {
    int         score;      /* score at last jmp */
    long        offset;     /* offset of prev block */
    short       ijmp;       /* current jmp index */
    struct jmp  jp;         /* list of jmps */
};

struct path {
    int     spc;        /* number of leading spaces */
    short   n[JMPS];    /* size of jmp (gap) */
    int     x[JMPS];    /* loc of jmp (last elem before gap) */
};

char        *ofile;         /* output file name */
char        *namex[2];      /* seq names: getseqs() */
char        *prog;          /* prog name for err msgs */
char        *seqx[2];       /* seqs: getseqs() */
int         dmax;           /* best diag: nw() */
int         dmax0;          /* final diag */
int         dna;            /* set if dna: main() */
int         endgaps;        /* set if penalizing end gaps */
int         gapx, gapy;     /* total gaps in seqs */
int         len0, len1;     /* seq lens */
int         ngapx, ngapy;   /* total size of gaps */
int         smax;           /* max score: nw() */
int         *xbm;           /* bitmap for matching */
long        offset;         /* current offset in jmp file */
struct diag *dx;            /* holds diagonals */
struct path pp[2];          /* holds path for seqs */

char        *calloc(), *malloc(), *index(), *strcpy();
char        *getseq(), *g_calloc();
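The gap constants in nw.h describe an affine penalty: a fixed opening cost plus a per-element cost, with extension no longer charged once a gap reaches MAXGAP elements. The sketch below is one reading of how those constants combine when end gaps are not penalized; the function `gap_penalty` is a hypothetical helper, not part of the listing, and the macros are repeated only to keep the example self-contained.

```c
/* Constants repeated from the header so the sketch compiles alone. */
#define MAXGAP  24      /* extension is not charged beyond this length */
#define DINS0   8       /* DNA: penalty for opening a gap */
#define DINS1   1       /* DNA: penalty per base */
#define PINS0   8       /* protein: penalty for opening a gap */
#define PINS1   4       /* protein: penalty per residue */

/* Total penalty for one gap of n elements (assumed reading:
 * opening cost plus per-element cost, the latter capped at MAXGAP). */
int gap_penalty(int is_dna, int n)
{
    int ins0 = is_dna ? DINS0 : PINS0;
    int ins1 = is_dna ? DINS1 : PINS1;
    int charged = (n < MAXGAP) ? n : MAXGAP;

    if (n <= 0)
        return 0;       /* no gap, no penalty */
    return ins0 + ins1 * charged;
}
```

Under this reading a three-residue protein gap costs 8 + 3 × 4 = 20, while any DNA gap of 24 bases or more costs 8 + 24 × 1 = 32.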
/* Needleman-Wunsch alignment program
 *
 * usage: progs file1 file2
 *  where file1 and file2 are two dna or two protein sequences.
 *  The sequences can be in upper- or lower-case an may contain ambiguity
 *  Any lines beginning with ';', '>' or '<' are ignored
 *  Max file length is 65535 (limited by unsigned short x in the jmp struct)
 *  A sequence with 1/3 or more of its elements ACGTU is assumed to be DNA
 *  Output is in the file "align.out"
 *
 * The program may create a tmp file in /tmp to hold info about traceback.
 * Original version developed under BSD 4.3 on a vax 8650
 */
#include "nw.h"
#include "day.h"

static  _dbval[26] = {
    1,14,2,13,0,0,4,11,0,0,12,0,3,15,0,0,0,5,6,8,8,7,9,0,10,0
};

static  _pbval[26] = {
    1, 2|(1<<('D'-'A'))|(1<<('N'-'A')), 4, 8, 16, 32, 64,
    128, 256, 0x1FFFFFF, 1<<10, 1<<11, 1<<12, 1<<13, 1<<14,
    1<<15, 1<<16, 1<<17, 1<<18, 1<<19, 1<<20, 1<<21, 1<<22,
    1<<23, 1<<24, 1<<25|(1<<('E'-'A'))|(1<<('Q'-'A'))
};

main(ac, av)
    int     ac;
    char    *av[];
{
    prog = av[0];
    if (ac != 3) {
        fprintf(stderr,"usage: %s file1 file2\n", prog);
        fprintf(stderr,"where file1 and file2 are two dna or two protein sequences.\n");
        fprintf(stderr,"The sequences can be in upper- or lower-case\n");
        fprintf(stderr,"Any lines beginning with ';' or '<' are ignored\n");
        fprintf(stderr,"Output is in the file \"align.out\"\n");
        exit(1);
    }
    namex[0] = av[1];
    namex[1] = av[2];
    seqx[0] = getseq(namex[0], &len0);
    seqx[1] = getseq(namex[1], &len1);
    xbm = (dna)? _dbval : _pbval;

    endgaps = 0;            /* 1 to penalize endgaps */
    ofile = "align.out";    /* output file */

    nw();           /* fill in the matrix, get the possible jmps */
    readjmps();     /* get the actual jmps */
    print();        /* print stats, alignment */

    cleanup(0);     /* unlink any tmp files */
}
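The 26-entry DNA table maps each letter to a bit mask over the four bases, so that two symbols match whenever their masks share a bit. Read this way, A = 1, C = 2, G = 4 and T = U = 8, and ambiguity codes are unions (for example R = A|G = 5, N = 15). The sketch below illustrates that test in isolation; `bases_match` is a hypothetical wrapper, while the table values are copied from the listing.

```c
/* Bit masks per letter A..Z, copied from the listing's DNA table. */
static const int dbval[26] = {
    1,14,2,13,0,0,4,11,0,0,12,0,3,15,0,0,0,5,6,8,8,7,9,0,10,0
};

/* Two upper-case symbols match when their base sets overlap.
 * Letters with a zero mask (not nucleotide codes) never match. */
int bases_match(char a, char b)
{
    if (a < 'A' || a > 'Z' || b < 'A' || b > 'Z')
        return 0;       /* assumption: reject anything outside A..Z */
    return (dbval[a - 'A'] & dbval[b - 'A']) != 0;
}
```

The same bitwise AND is what the alignment loop evaluates through `xbm` when scoring DNA, which is how ambiguity codes in either input are tolerated.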
/* do the alignment, return best score: main()
 * dna: values in Fitch and Smith, PNAS, 80, 1382-1386, 1983
 * pro: PAM 250 values
 * When scores are equal, we prefer mismatches to any gap, prefer
 * a new gap to extending an ongoing gap, and prefer a gap in seqx
 * to a gap in seq y.
 */
nw()
{
    char        *px, *py;       /* seqs and ptrs */
    int         *ndely, *dely;  /* keep track of dely */
    int         ndelx, delx;    /* keep track of delx */
    int         *tmp;           /* for swapping row0, row1 */
    int         mis;            /* score for each type */
    int         ins0, ins1;     /* insertion penalties */
    register    id;             /* diagonal index */
    register    ij;             /* jmp index */
    register    *col0, *col1;   /* score for curr, last row */
    register    xx, yy;         /* index into seqs */

    dx = (struct diag *)g_calloc("to get diags", len0+len1+1, sizeof(struct diag));

    ndely = (int *)g_calloc("to get ndely", len1+1, sizeof(int));
    dely = (int *)g_calloc("to get dely", len1+1, sizeof(int));
    col0 = (int *)g_calloc("to get col0", len1+1, sizeof(int));
    col1 = (int *)g_calloc("to get col1", len1+1, sizeof(int));
    ins0 = (dna)? DINS0 : PINS0;
    ins1 = (dna)? DINS1 : PINS1;

    smax = -10000;
    if (endgaps) {
        for (col0[0] = dely[0] = -ins0, yy = 1; yy <= len1; yy++) {
            col0[yy] = dely[yy] = col0[yy-1] - ins1;
            ndely[yy] = yy;
        }
        col0[0] = 0;    /* Waterman Bull Math Biol 84 */
    }
    else
        for (yy = 1; yy <= len1; yy++)
            dely[yy] = -ins0;

    /* fill in match matrix
     */
    for (px = seqx[0], xx = 1; xx <= len0; px++, xx++) {
        /* initialize first entry in col
         */
        if (endgaps) {
            if (xx == 1)
                col1[0] = delx = -(ins0+ins1);
            else
                col1[0] = delx = col0[0] - ins1;
            ndelx = xx;
        }
        else {
            col1[0] = 0;
            delx = -ins0;
            ndelx = 0;
        }
FIGURE 4E

        for (py = seqx[1], yy = 1; yy <= len1; py++, yy++) {
            mis = col0[yy-1];
            if (dna)
                mis += (xbm[*px-'A']&xbm[*py-'A'])? DMAT : DMIS;
            else
                mis += _day[*px-'A'][*py-'A'];

            /* update penalty for del in x seq;
             * favor new del over ongong del
             * ignore MAXGAP if weighting endgaps
             */
            if (endgaps || ndely[yy] < MAXGAP) {
                if (col0[yy] - ins0 >= dely[yy]) {
                    dely[yy] = col0[yy] - (ins0+ins1);
                    ndely[yy] = 1;
                } else {
                    dely[yy] -= ins1;
                    ndely[yy]++;
                }
            } else {
                if (col0[yy] - (ins0+ins1) >= dely[yy]) {
                    dely[yy] = col0[yy] - (ins0+ins1);
                    ndely[yy] = 1;
                } else
                    ndely[yy]++;
            }

            /* update penalty for del in y seq;
             * favor new del over ongong del
             */
            if (endgaps || ndelx < MAXGAP) {
                if (col1[yy-1] - ins0 >= delx) {
                    delx = col1[yy-1] - (ins0+ins1);
                    ndelx = 1;
                } else {
                    delx -= ins1;
                    ndelx++;
                }
            } else {
                if (col1[yy-1] - (ins0+ins1) >= delx) {
                    delx = col1[yy-1] - (ins0+ins1);
                    ndelx = 1;
                } else
                    ndelx++;
            }

            /* pick the maximum score; we're favoring
             * mis over any del and delx over dely
             */
FIGURE 4F

            id = xx - yy + len1 - 1;
            if (mis >= delx && mis >= dely[yy])
                col1[yy] = mis;
            else if (delx >= dely[yy]) {
                col1[yy] = delx;
                ij = dx[id].ijmp;
                if (dx[id].jp.n[0] && (!dna || (ndelx >= MAXJMP
                && xx > dx[id].jp.x[ij]+MX) || mis > dx[id].score+DINS0)) {
                    dx[id].ijmp++;
                    if (++ij >= MAXJMP) {
                        writejmps(id);
                        ij = dx[id].ijmp = 0;
                        dx[id].offset = offset;
                        offset += sizeof(struct jmp) + sizeof(offset);
                    }
                }
                dx[id].jp.n[ij] = ndelx;
                dx[id].jp.x[ij] = xx;
                dx[id].score = delx;
            }
            else {
                col1[yy] = dely[yy];
                ij = dx[id].ijmp;
                if (dx[id].jp.n[0] && (!dna || (ndely[yy] >= MAXJMP
                && xx > dx[id].jp.x[ij]+MX) || mis > dx[id].score+DINS0)) {
                    dx[id].ijmp++;
                    if (++ij >= MAXJMP) {
                        writejmps(id);
                        ij = dx[id].ijmp = 0;
                        dx[id].offset = offset;
                        offset += sizeof(struct jmp) + sizeof(offset);
                    }
                }
                dx[id].jp.n[ij] = -ndely[yy];
                dx[id].jp.x[ij] = xx;
                dx[id].score = dely[yy];
            }
            if (xx == len0 && yy < len1) {
                /* last col
                 */
                if (endgaps)
                    col1[yy] -= ins0+ins1*(len1-yy);
                if (col1[yy] > smax) {
                    smax = col1[yy];
                    dmax = id;
                }
            }
        }
        if (endgaps && xx < len0)
            col1[yy-1] -= ins0+ins1*(len0-xx);
        if (col1[yy-1] > smax) {
            smax = col1[yy-1];
            dmax = id;
        }
        tmp = col0; col0 = col1; col1 = tmp;
    }
    (void) free((char *)ndely);
    (void) free((char *)dely);
    (void) free((char *)col0);
    (void) free((char *)col1);
}

Page 4 of nw.c
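The alignment routine records gaps per diagonal of the score matrix, using the index xx - yy + len1 - 1, where xx and yy are 1-based positions in the first and second sequence. The sketch below isolates that mapping to show its range; the helper names are hypothetical.

```c
/* Diagonal index of matrix cell (xx, yy), both 1-based, for a
 * second sequence of length len1. Cells on the same diagonal
 * (equal xx - yy) share one index. */
int diag_index(int xx, int yy, int len1)
{
    return xx - yy + len1 - 1;
}

/* Number of distinct diagonals for sequences of length len0 and len1:
 * indexes run from 0 (xx = 1, yy = len1) to len0 + len1 - 2. */
int diag_count(int len0, int len1)
{
    return len0 + len1 - 1;
}
```

Because the largest index is len0 + len1 - 2, an array of len0 + len1 + 1 diagonal records, as allocated at the top of the routine, leaves headroom beyond the indexes actually used.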
/*
 * print() -- only routine visible outside this module
 *
 * static:
 * getmat() -- trace back best path, count matches: print()
 * pr_align() -- print alignment of described in array p[]: print()
 * dumpblock() -- dump a block of lines with numbers, stars: pr_align()
 * nums() -- put out a number line: dumpblock()
 * putline() -- put out a line (name, [num], seq, [num]): dumpblock()
 * stars() -- put a line of stars: dumpblock()
 * stripname() -- strip any path and prefix from a seqname
 */

#include "nw.h"

#define SPC     3
#define P_LINE  256     /* maximum output line */
#define P_SPC   3       /* space between name or num and seq */

extern  _day[26][26];
int     olen;           /* set output line length */
FILE    *fx;            /* output file */

print()
{
    int     lx, ly, firstgap, lastgap;  /* overlap */

    if ((fx = fopen(ofile, "w")) == 0) {
        fprintf(stderr,"%s: can't write %s\n", prog, ofile);
        cleanup(1);
    }
    fprintf(fx, "<first sequence: %s (length = %d)\n", namex[0], len0);
    fprintf(fx, "<second sequence: %s (length = %d)\n", namex[1], len1);
    olen = 60;
    lx = len0;
    ly = len1;
    firstgap = lastgap = 0;
    if (dmax < len1 - 1) {          /* leading gap in x */
        pp[0].spc = firstgap = len1 - dmax - 1;
        ly -= pp[0].spc;
    }
    else if (dmax > len1 - 1) {     /* leading gap in y */
        pp[1].spc = firstgap = dmax - (len1 - 1);
        lx -= pp[1].spc;
    }
    if (dmax0 < len0 - 1) {         /* trailing gap in x */
        lastgap = len0 - dmax0 - 1;
        lx -= lastgap;
    }
    else if (dmax0 > len0 - 1) {    /* trailing gap in y */
        lastgap = dmax0 - (len0 - 1);
        ly -= lastgap;
    }
    getmat(lx, ly, firstgap, lastgap);
    pr_align();
}
/*
 * trace back the best path, count matches
 */
static
getmat(lx, ly, firstgap, lastgap)
    int     lx, ly;             /* "core" (minus endgaps) */
    int     firstgap, lastgap;  /* leading trailing overlap */
{
    int             nm, i0, i1, siz0, siz1;
    char            outx[32];
    double          pct;
    register        n0, n1;
    register char   *p0, *p1;

    /* get total matches, score
     */
    i0 = i1 = siz0 = siz1 = 0;
    p0 = seqx[0] + pp[1].spc;
    p1 = seqx[1] + pp[0].spc;
    n0 = pp[1].spc + 1;
    n1 = pp[0].spc + 1;

    nm = 0;
    while ( *p0 && *p1 ) {
        if (siz0) {
            p1++;
            n1++;
            siz0--;
        }
        else if (siz1) {
            p0++;
            n0++;
            siz1--;
        }
        else {
            if (xbm[*p0-'A']&xbm[*p1-'A'])
                nm++;
            if (n0++ == pp[0].x[i0])
                siz0 = pp[0].n[i0++];
            if (n1++ == pp[1].x[i1])
                siz1 = pp[1].n[i1++];
            p0++;
            p1++;
        }
    }

    /* pct homology:
     * if penalizing endgaps, base is the shorter seq
     * else, knock off overhangs and take shorter core
     */
    if (endgaps)
        lx = (len0 < len1)? len0 : len1;
    else
        lx = (lx < ly)? lx : ly;
    pct = 100.*(double)nm/(double)lx;
    fprintf(fx, "\n");
    fprintf(fx, "<%d match%s in an overlap of %d: %.2f percent similarity\n",
        nm, (nm == 1)? "" : "es", lx, pct);
    fprintf(fx, "<gaps in first sequence: %d", gapx);
    if (gapx) {
        (void) sprintf(outx, " (%d %s%s)",
            ngapx, (dna)? "base":"residue", (ngapx == 1)? "":"s");
        fprintf(fx,"%s", outx);
    }

    fprintf(fx, ", gaps in second sequence: %d", gapy);
    if (gapy) {
        (void) sprintf(outx, " (%d %s%s)",
            ngapy, (dna)? "base":"residue", (ngapy == 1)? "":"s");
        fprintf(fx,"%s", outx);
    }
    if (dna)
        fprintf(fx,
        "\n<score: %d (match = %d, mismatch = %d, gap penalty = %d + %d per base)\n",
        smax, DMAT, DMIS, DINS0, DINS1);
    else
        fprintf(fx,
        "\n<score: %d (Dayhoff PAM 250 matrix, gap penalty = %d + %d per residue)\n",
        smax, PINS0, PINS1);
    if (endgaps)
        fprintf(fx,
        "<endgaps penalized, left endgap: %d %s%s, right endgap: %d %s%s\n",
        firstgap, (dna)? "base" : "residue", (firstgap == 1)? "" : "s",
        lastgap, (dna)? "base" : "residue", (lastgap == 1)? "" : "s");
    else
        fprintf(fx, "<endgaps not penalized\n");
}

static      nm;             /* matches in core -- for checking */
static      lmax;           /* lengths of stripped file names */
static      ij[2];          /* jmp index for a path */
static      nc[2];          /* number at start of current line */
static      ni[2];          /* current elem number -- for gapping */
static      siz[2];
static char *ps[2];         /* ptr to current element */
static char *po[2];         /* ptr to next output char slot */
static char out[2][P_LINE]; /* output line */
static char star[P_LINE];   /* set by stars() */

/*
 * print alignment of described in struct path pp[]
 */
static
pr_align()
{
    int         nn;     /* char count */
    int         more;
    register    i;

    for (i = 0, lmax = 0; i < 2; i++) {
        nn = stripname(namex[i]);
        if (nn > lmax)
            lmax = nn;

        nc[i] = 1;
        ni[i] = 1;
        siz[i] = ij[i] = 0;
        ps[i] = seqx[i];
        po[i] = out[i];
    }



Page 3 of nwprint.c 



FIGURE 4J 



    for (nn = nm = 0, more = 1; more; ) {
        for (i = more = 0; i < 2; i++) {
            /*
             * do we have more of this sequence?
             */
            if (!*ps[i])
                continue;

            more++;

            if (pp[i].spc) {    /* leading space */
                *po[i]++ = ' ';
                pp[i].spc--;
            }
            else if (siz[i]) {  /* in a gap */
                *po[i]++ = '-';
                siz[i]--;
            }
            else {              /* we're putting a seq element
                                 */
                *po[i] = *ps[i];
                if (islower(*ps[i]))
                    *ps[i] = toupper(*ps[i]);
                po[i]++;
                ps[i]++;

                /*
                 * are we at next gap for this seq?
                 */
                if (ni[i] == pp[i].x[ij[i]]) {
                    /*
                     * we need to merge all gaps
                     * at this location
                     */
                    siz[i] = pp[i].n[ij[i]++];
                    while (ni[i] == pp[i].x[ij[i]])
                        siz[i] += pp[i].n[ij[i]++];
                }
                ni[i]++;
            }
        }
        if (++nn == olen || !more && nn) {
            dumpblock();
            for (i = 0; i < 2; i++)
                po[i] = out[i];
            nn = 0;
        }
    }
}

/*
 * dump a block of lines, including numbers, stars: pr_align()
 */
static
dumpblock()
{
    register    i;

    for (i = 0; i < 2; i++)
        *po[i]-- = '\0';
    (void) putc('\n', fx);
    for (i = 0; i < 2; i++) {
        if (*out[i] && (*out[i] != ' ' || *(po[i]) != ' ')) {
            if (i == 0)
                nums(i);
            if (i == 0 && *out[1])
                stars();
            putline(i);
            if (i == 0 && *out[1])
                fprintf(fx, star);
            if (i == 1)
                nums(i);
        }
    }
}
/*
 * put out a number line: dumpblock()
 */
static
nums(ix)
    int     ix;     /* index in out[] holding seq line */
{
    char            nline[P_LINE];
    register        i, j;
    register char   *pn, *px, *py;

    for (pn = nline, i = 0; i < lmax+P_SPC; i++, pn++)
        *pn = ' ';
    for (i = nc[ix], py = out[ix]; *py; py++, pn++) {
        if (*py == ' ' || *py == '-')
            *pn = ' ';
        else {
            if (i%10 == 0 || (i == 1 && nc[ix] != 1)) {
                j = (i < 0)? -i : i;
                for (px = pn; j; j /= 10, px--)
                    *px = j%10 + '0';
                if (i < 0)
                    *px = '-';
            }
            else
                *pn = ' ';
            i++;
        }
    }
    *pn = '\0';
    nc[ix] = i;
    for (pn = nline; *pn; pn++)
        (void) putc(*pn, fx);
    (void) putc('\n', fx);
}

/*
 * put out a line (name, [num], seq, [num]): dumpblock()
 */
static
putline(ix)
    int     ix;
{
    int             i;
    register char   *px;

    for (px = namex[ix], i = 0; *px && *px != ':'; px++, i++)
        (void) putc(*px, fx);
    for (; i < lmax+P_SPC; i++)
        (void) putc(' ', fx);

    /* these count from 1:
     * ni[] is current element (from 1)
     * nc[] is number at start of current line
     */
    for (px = out[ix]; *px; px++)
        (void) putc(*px&0x7F, fx);
    (void) putc('\n', fx);
}

/*
 * put a line of stars (seqs always in out[0], out[1]): dumpblock()
 */
static
stars()
{
    int             i;
    register char   *p0, *p1, cx, *px;

    if (!*out[0] || (*out[0] == ' ' && *(po[0]) == ' ') ||
        !*out[1] || (*out[1] == ' ' && *(po[1]) == ' '))
        return;
    px = star;
    for (i = lmax+P_SPC; i; i--)
        *px++ = ' ';

    for (p0 = out[0], p1 = out[1]; *p0 && *p1; p0++, p1++) {
        if (isalpha(*p0) && isalpha(*p1)) {
            if (xbm[*p0-'A']&xbm[*p1-'A']) {
                cx = '*';
                nm++;
            }
            else if (!dna && _day[*p0-'A'][*p1-'A'] > 0)
                cx = '.';
            else
                cx = ' ';
        }
        else
            cx = ' ';
        *px++ = cx;
    }
    *px++ = '\n';
    *px = '\0';
}
/*
 * strip path or prefix from pn, return len: pr_align()
 */
static
stripname(pn)
    char    *pn;    /* file name (may be path) */
{
    register char   *px, *py;

    py = 0;
    for (px = pn; *px; px++)
        if (*px == '/')
            py = px + 1;
    if (py)
        (void) strcpy(pn, py);
    return(strlen(pn));
}
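The name-stripping routine keeps only the text after the last '/' of a file name. A minimal modern sketch of the same idea follows; `base_name` is a hypothetical name, and unlike the listing it returns a pointer into the original string instead of copying the tail over the start of the buffer.

```c
#include <string.h>

/* Return the part of `path` after the last '/', or the whole
 * string when it contains no '/'. The input is not modified. */
const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');

    return slash ? slash + 1 : path;
}
```

Returning a pointer avoids calling `strcpy` on overlapping source and destination buffers, which standard C leaves undefined; the length that the listing returns is then simply `strlen(base_name(path))`.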
/*
 * cleanup() -- cleanup any tmp file
 * getseq() -- read in seq, set dna, len, maxlen
 * g_calloc() -- calloc() with error checkin
 * readjmps() -- get the good jmps, from tmp file if necessary
 * writejmps() -- write a filled array of jmps to a tmp file: nw()
 */
#include "nw.h"
#include <sys/file.h>

char    *jname = "/tmp/homgXXXXXX";     /* tmp file for jmps */
FILE    *fj;

int     cleanup();      /* cleanup tmp file */
long    lseek();

/*
 * remove any tmp file if we blow
 */
cleanup(i)
    int     i;
{
    if (fj)
        (void) unlink(jname);
    exit(i);
}

/*
 * read, return ptr to seq, set dna, len, maxlen
 * skip lines starting with ';', '<', or '>'
 * seq in upper or lower case
 */
char    *
getseq(file, len)
    char    *file;  /* file name */
    int     *len;   /* seq len */
{
    char            line[1024], *pseq;
    register char   *px, *py;
    int             natgc, tlen;
    FILE            *fp;

    if ((fp = fopen(file,"r")) == 0) {
        fprintf(stderr,"%s: can't read %s\n", prog, file);
        exit(1);
    }
    tlen = natgc = 0;
    while (fgets(line, 1024, fp)) {
        if (*line == ';' || *line == '<' || *line == '>')
            continue;
        for (px = line; *px != '\n'; px++)
            if (isupper(*px) || islower(*px))
                tlen++;
    }
    if ((pseq = malloc((unsigned)(tlen+6))) == 0) {
        fprintf(stderr,"%s: malloc() failed to get %d bytes for %s\n", prog, tlen+6, file);
        exit(1);
    }
    pseq[0] = pseq[1] = pseq[2] = pseq[3] = '\0';
    py = pseq + 4;
    *len = tlen;
    rewind(fp);

    while (fgets(line, 1024, fp)) {
        if (*line == ';' || *line == '<' || *line == '>')
            continue;
        for (px = line; *px != '\n'; px++) {
            if (isupper(*px))
                *py++ = *px;
            else if (islower(*px))
                *py++ = toupper(*px);
            if (index("ATGCU",*(py-1)))
                natgc++;
        }
    }
    *py++ = '\0';
    *py = '\0';
    (void) fclose(fp);
    dna = natgc > (tlen/3);
    return(pseq+4);
}
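The sequence reader decides between DNA and protein scoring with a simple composition test: the input is treated as DNA when the count of A, T, G, C and U letters exceeds one third of the sequence length. The sketch below restates that heuristic on an in-memory string; `looks_like_dna` is a hypothetical helper, and it counts only alphabetic characters, which is a simplification of the file-based loop in the listing.

```c
#include <ctype.h>
#include <string.h>

/* Return 1 when more than a third (integer division) of the
 * letters in `seq` are A, T, G, C or U; otherwise return 0. */
int looks_like_dna(const char *seq)
{
    int natgc = 0;      /* letters that are A, T, G, C or U */
    int tlen = 0;       /* all letters seen */
    const char *p;

    for (p = seq; *p; p++) {
        int c = toupper((unsigned char)*p);

        if (!isalpha(c))
            continue;   /* assumption: ignore digits, spaces, etc. */
        tlen++;
        if (strchr("ATGCU", c))
            natgc++;
    }
    return natgc > tlen / 3;
}
```

Note that the comparison is a strict greater-than against an integer quotient, so a protein that happens to be rich in A, C, G and T residues could still be classified as DNA.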
char    *
g_calloc(msg, nx, sz)
    char    *msg;       /* program, calling routine */
    int     nx, sz;     /* number and size of elements */
{
    char    *px, *calloc();

    if ((px = calloc((unsigned)nx, (unsigned)sz)) == 0) {
        if (*msg) {
            fprintf(stderr, "%s: g_calloc() failed %s (n=%d, sz=%d)\n", prog, msg, nx, sz);
            exit(1);
        }
    }
    return(px);
}

/*
 * get final jmps from dx[] or tmp file, set pp[], reset dmax: main()
 */
readjmps()
{
    int         fd = -1;
    int         siz, i0, i1;
    register    i, j, xx;

    if (fj) {
        (void) fclose(fj);
        if ((fd = open(jname, O_RDONLY, 0)) < 0) {
            fprintf(stderr, "%s: can't open() %s\n", prog, jname);
            cleanup(1);
        }
    }
    for (i = i0 = i1 = 0, dmax0 = dmax, xx = len0; ; i++) {
        while (1) {
            for (j = dx[dmax].ijmp; j >= 0 && dx[dmax].jp.x[j] >= xx; j--)
                ;

Page 2 of nwsubr.c
            if (j < 0 && dx[dmax].offset && fj) {
                (void) lseek(fd, dx[dmax].offset, 0);
                (void) read(fd, (char *)&dx[dmax].jp, sizeof(struct jmp));
                (void) read(fd, (char *)&dx[dmax].offset, sizeof(dx[dmax].offset));
                dx[dmax].ijmp = MAXJMP-1;
            }
            else
                break;
        }
        if (i >= JMPS) {
            fprintf(stderr, "%s: too many gaps in alignment\n", prog);
            cleanup(1);
        }
        if (j >= 0) {
            siz = dx[dmax].jp.n[j];
            xx = dx[dmax].jp.x[j];
            dmax += siz;
            if (siz < 0) {              /* gap in second seq */
                pp[1].n[i1] = -siz;
                xx += siz;
                /* id = xx - yy + len1 - 1
                 */
                pp[1].x[i1] = xx - dmax + len1 - 1;
                gapy++;
                ngapy -= siz;
                /* ignore MAXGAP when doing endgaps */
                siz = (-siz < MAXGAP || endgaps)? -siz : MAXGAP;
                i1++;
            }
            else if (siz > 0) {         /* gap in first seq */
                pp[0].n[i0] = siz;
                pp[0].x[i0] = xx;
                gapx++;
                ngapx += siz;
                /* ignore MAXGAP when doing endgaps */
                siz = (siz < MAXGAP || endgaps)? siz : MAXGAP;
                i0++;
            }
        }
        else
            break;
    }

    /* reverse the order of jmps
     */
    for (j = 0, i0--; j < i0; j++, i0--) {
        i = pp[0].n[j]; pp[0].n[j] = pp[0].n[i0]; pp[0].n[i0] = i;
        i = pp[0].x[j]; pp[0].x[j] = pp[0].x[i0]; pp[0].x[i0] = i;
    }
    for (j = 0, i1--; j < i1; j++, i1--) {
        i = pp[1].n[j]; pp[1].n[j] = pp[1].n[i1]; pp[1].n[i1] = i;
        i = pp[1].x[j]; pp[1].x[j] = pp[1].x[i1]; pp[1].x[i1] = i;
    }
    if (fd >= 0)
        (void) close(fd);
    if (fj) {
        (void) unlink(jname);
        fj = 0;
        offset = 0;
    }
}

Page 3 of nwsubr.c
/*
 * write a filled jmp struct offset of the prev one (if any): nw()
 */
writejmps(ix)
    int     ix;
{
    char    *mktemp();

    if (!fj) {
        if (mktemp(jname) < 0) {
            fprintf(stderr, "%s: can't mktemp() %s\n", prog, jname);
            cleanup(1);
        }
        if ((fj = fopen(jname, "w")) == 0) {
            fprintf(stderr, "%s: can't write %s\n", prog, jname);
            exit(1);
        }
    }
    (void) fwrite((char *)&dx[ix].jp, sizeof(struct jmp), 1, fj);
    (void) fwrite((char *)&dx[ix].offset, sizeof(dx[ix].offset), 1, fj);
}

GTGCTCTCCGAGGACAAGCAGGAGGNGGTGGAGCTGGTGAAGCACCATCTGTGGGCTCTG 
GAAGTGTGCTACATCTCAGCCTTGGTCTTGTCCTGCTTACTCACCTTCCTGGTCCTGATG 
CGCTCACTGGTGACACACAGGACCAACCTTCGAGCTCTGCACCGAGGAGCTGCCCTGGAC 
TTGAGTCCCTTGCATCGGAGTCCCCATCCCTCCCGCCAAGCCATATTCTGTTGGATGAGC 

TTCAGTGCCTACCAGACAGCCTTTATCTGCCTTGGGCTCCTGGTGCAGCAGATCATCTTC
TTCCTGGGAACCACGGCCCTGGCCTTCCTGGTGCTCATGCCTGTGCTCCATGGCAGGAAC 
CTCCTGCTCTTCCGTTCCCTGGAGTCCTCGTGGCCCTTCTGGCTGACTTTGGCCCTGGCT 
GTGATCCTGCAGAACATGGCAGCCCATTGGGTCTTCCTGGAGACTCATGATGGACACCCA 
CAGCTGACCAACCGGCGAGTGCTCTATGCAGCCACCTTTCTTCTCTTCCCCCTCAATGTG 

CTGGTGGGTGCCATGGTGGCCACCTGGCGAGTGCTCCTCTCTGCCCTCTACAACGCCATC
CACCTTGGCCAGATGGACCTCAGCCTGCTGCCACCGAGAGCCGCCACTCTCGACCCCGGC 
TACTACACGTACCGAA 



FIGURE 6 



CACAACCAGCCACCCCTCTAGGATCCCAGCCCAGCTGGTGCTGGGCTCAGAGGAGAAGGC 
CCCGTGTTGGGAGCACCCTGCTTGCCTGGAGGGACAAGTTTCCGGGAGAGATCAATAAAG
GAAAGGAAAGAGACAAGGAAGGGAGAGGTCAGGAGAGCGCTTGATTGGAGGAGAAGGGCC 
AGAGA ATG TCGTCCCAGCCAGCAGGGAACCAGACCTCCCCCGGGGCCACAGAGGACTACT 
CCTATGGCAGCTGGTACATCGATGAGCCCCAGGGGGGCGAGGAGCTCCAGCCAGAGGGGG 
AAGTGCCCTCCTGCCACACCAGCATACCACCCGGCCTGTACCACGCCTGCCTGGCCTCGC 
TGTCAATCCTTGTGCTGCTGCTCCTGGCCATGCTGGTGAGGCGCCGCCAGCTCTGGCCTG
ACTGTGTGCGTGGCAGGCCCGGGCTGCCCAGGCCCCGGGCAGTGCCTGCTGCTGTTTTCA 
TGGTCCTCCTGAGCTCCCTGTGTTTGCTGCTCCCCGACGAGGACGCATTGCCCTTCCTGA
CTCTCGCCTCAGCACCCAGCCAAGATGGGAAAACTGAGGCTCCAAGAGGGGCCTGGAAGA 
TACTGGGACTGTTCTATTATGCTGCCCTCTACTACCCTCTGGCTGCCTGTGCCACGGCTG 

GCCACACAGCTGCACACCTGCTCGGGAGCACGCTGTCCTGGGCCCACCTTGGGGTCCAGG

TCTGGCAGAGGGCAGAGTGTCCCCAGGTGCCCAAGATCTACAAGTACTACTCCCTGCTGG 
CCTCCCTGCCTCTCCTGCTGGGCCTCGGATTCCTGAGCCTTTGGTACCCTGTGCAGCTGG 
TGAGAAGCTTCAGCCGTAGGACAGGAGCAGGCTCCAAGGGGCTGCAGAGCAGCTACTCTG 
AGGAATATCTGAGGAACCTCCTTTGCAGGAAGAAGCTGGGAAGCAGCTACCACACCTCCA 

AGCATGGCTTCCTGTCCTGGGCCCGCGTCTGCTTGAGACACTGCATCTACACTCCACAGC

CAGGATTCCATCTCCCGCTGAAGCTGGTGCTTTGAGCTACACTGACAGGGACGGCCATTT 
ACCAGGTGGCCCTGCTGCTGCTGGTGGGCGTGGTACCCACTATCCAGAAGGTGAGGGCAG 
GGGTCACCACGGATGTCTCCTACCTGCTGGCCGGCTTTGGAATCGTGCTCTCCGAGGACA 
AGCAGGAGGTGGTGGAGCTGGTGAAGCACCATCTGTGGGCTCTGGAAGTGTGCTACATCT 

CAGCCTTGGTCTTGTCCTGCTTACTCACCTTCCTGGTCCTGATGCGCTCACTGGTGACAC

ACAGGACCAACCTTCGAGCTCTGCACCGAGGAGCTGCCCTGGACTTGAGTCCCTTGCATC 
GGAGTCCCCATCCCTCCCGCCAAGCCATATTCTGTTGGATGAGCTTCAGTGCCTACCAGA 
CAGCCTTTATCTGCCTTGGGCTCCTGGTGCAGCAGATCATCTTCTTCCTGGGAACCACGG 
CCCTGGCCTTCCTGGTGCTCATGCCTGTGCTCCATGGCAGGAACCTCCTGCTCTTCCGTT 

CCCTGGAGTCCTCGTGGCCCTTCTGGCTGACTTTGGCCCTGGCTGTGATCCTGCAGAACA

TGGCAGCGCATTGGGTCTTCCTGGAGACTCATGATGGAGACCCACAGCTGACCAACCGGC 
GAGTGCTCTATGCAGCCACCTTTCTTCTCTTCCCCCTCAATGTGCTGGTGGGTGCCATAG 
TGGCCACCTGGCGAGTGCTCCTCTCTGCCCTCTACAACGGGATCCACCTTGGCCAGATGG 
ACCTCAGCCTGCTGCCACCGAGAGCCGCCACTCTCGACCCCGGCTACTACACGTACCGAA 

ACTTCTTGAAGATTGAAGTCAGCCAGTCGCATCCAGCCATGACAGCCTTCTGCTCCCTGC

TCCTGCAAGCGCAGAGCCTCCTACCCAGGACCATGGCAGCCCCCCAGGAGAGCCTCAGAC 
GAGGGGAGGAAGACGAAGGGATGCAGGTGCTACAGACAAAGGACTCCATGGCCAAGGGAG 
CTAGGCCCGGGGCCAGCCGCGGCAGGGCTCGCTGGGGTCTGGCCTACACGCTGCTGCACA 
ACCCAACCCTGCAGGTCTTCCGCAAGACGGGCCTGTTGGGTGCCAATGGTGCCCAGCCCT 

GAGGGGAGGGAAGGTCAACCCACCTGCCCATCTGTGGTGAGGCATGTTCCTGCCTACCAC

CTCCTCCCTCCCCGGCTCTCCTCCCAGCATCACACCAGCCATGCAGCCAGCAGGTCCTCC 
GGATCACTGTGGTTGGGTGGAGGTCTGTCTGCACTGGGAGCCTCAGGAGGGCTCTGCTCC 
ACCCACTTGGCTATGGGAGAGCCAGCAGGGGTTCTGGAGAAAGAAACTGGTGGGTTAGGG 
CCTTGGTCCAGGAGCCAGTTGAGCCAGGGCAGCGACATCCAGGCGTCTCCCTACCCTGGC 

TCTGCCATCAGCCTTGAAGGGCCTCGATGAAGCCTTCTCTGGAACCACTCCAGCCCAGCT
CCACCTCAGCCTTGGCCTTGACGCTGTGGAAGCAGCCAAGGCACTTCCTCACCGCCTCAG 
CGCCACGGACGTCTCTGGGGAGTGGCCGGAAAGCTCGCGGGCGTCTGGCCTGCAGGGCAG 
CCGAAGTCATGACTCAGACCAGGTCCCACACTGAGCTGCCCACACTCGAGAGCCAGATAT 
TTTTGTAGTTTTTATGCCTTTGGCTATTATGAAAGAGGTTAGTGTGTTGCCTGCAATAAA 

CTTGTTCGTGAGAAAAA



FIGURE 7 



MSSQPAGNQTSPGATEDYSYGSWYIDEPQGGEELQPEGEVPSCHTSIPPGLYHACLASL
SILVLLLLAMLVRRRQLWPDCVRGRPGLPRPRAVPAAVFMVLLSSLCLLLPDEDALPFL 
TLASAPSQDGKTEAPRGAWKILGLFYYAALYYPLAACATAGHTAAHLLGSTLSWAHLGV
QVWQRAECPQVPKIYKYYSLLASLPLLLGLGFLSLWYPVQLVRSFSRRTGAGSKGLQSS
YSEEYLRNLLCRKKLGSSYHTSKHGFLSWARVCLRHCIYTPQPGFHLPLKLVLSATLTG
TAIYQVALLLLVGVVPTIQKVRAGVTTDVSYLLAGFGIVLSEDKQEVVELVKHHLWALE
VCYISALVLSCLLTFLVLMRSLVTHRTNLRALHRGAALDLSPLHRSPHPSRQAIFCWMS 
FSAYQTAFICLGLLVQQIIFFLGTTALAFLVLMPVLHGRNLLLFRSLESSWPFWLTLAL
AVILQNMAAHWVFLETHDGHPQLTNRRVLYAATFLLFPLNVLVGAIVATWRVLLSALYN
Important features of the protein:

Signal peptide:
none

Transmembrane domain:
425-444
Motif name: N-glycosylation site.
Motif name: N-myristoylation site.
Motif name: Prokaryotic membrane lipoprotein lipid attachment site.
Motif name: ATP/GTP-binding site motif A (P-loop).
Figure 26